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ABSTRACT 

This paper introduces an extension of the hnear least-squares (or Lomb-Scargle) pe- 
riodogram for the case when the model of the signal to be detected is non-sinusoidal 
and depends on unknown parameters in a non-linear manner. The attention is paid to 
the problem of estimating the statistical significance of candidate periodicities found 
using such non-linear periodograms. This problem is related to the task of quanti- 
fying the distributions of maximum values of these periodograms. Based on recent 
results in the mathematical theory of extreme values of random field (the generalized 
Rice method), we give a general approach to find handy analytic approximation for 
these distributions. This approximation has the general form P{^/z), where P is 
an algebraic polynomial and z being the periodogram maximum. 

The general tools developed in this paper can be used in a wide variety of as- 
tronomical applications, for instance in the studies of variable stars and extrasolar 
planets. For this goal, we develop and consider in details the so-called von Mises pe- 
riodogram: a specialized non-linear periodogram where the signal is modelled by the 
von Mises periodic function exp(iy cos ujt) . This simple function with an additional 
non-linear parameter v can model lightcurves of many astronomical objects that show 
periodic photometric variability of different nature. We prove that our approach can 
be perfectly applied to this non-linear periodogram. 

We provide a package of auxiliary C-l — I- programs, attached as the online-only 
material. They should faciliate the use of the von Mises periodogram in practice. 
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1 INTRODUCTION 

The lLombI (llQTel l- IScargl^ (|l982l ') (hereafter LS) periodogram 
is a well-known powerful tool, which is widely used to search 
for periodicities in observational data. The main idea used 
in the LS periodogram is to perform a least-squares fit of 
the data with a sinuous (harmonic) model of the signal and 
then to check how much the resulting value of function 
improves for a given signal frequency. The maximum value 
of the LS periodogram (i.e., the maximum decrement in the 
goodness-of-fit measure) corresponds to the most likely 
frequency of the periodic signal. This natural idea is quite 
easy to implement in numerical calculations. The linearity 
of the harmonic model with respect to unknown parame- 
ters (two coefficients near the sine and cosine) introduces 
additional simplifications. 



* E-mail: roman@astro.spbu.ru 



Any signal detecting tool is not of much use without 
accompanying method of estimating the statistical signifi- 
cance of candidate periodicities. Indeed, the random errors 
contaminating the input data inspire noise fluctuations on 
the periodogram, so that we can never be completely sure 
that the peak that we actually observed is a result of real 
periodicity in the data. To assess the statistical significance 
of the observed periodogram peak, we need to calculate the 
'false alarm probability' (hereafter FAP) associated with this 
peak. The FAP is the probability that the observed or larger 
periodogram peak could be produced by random measure- 
ment errors. The smaller is FAP, the larger is the statistical 
significance. Given some small tolerance value FAP, (say, 
1%), we could claim that the detected candidate periodic- 
ity is statisticaly significant (when FAP < FAP*) or is not 
(when FAP > FAP,). 

From the statistical view point, the FAP is tightly con- 
nected with the probability distributions of the periodogram 
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considered under the null hypothesis (i.e., the hypothesis of 
no signal in the data) . If the frequency of the putative sig- 
nal was known a priori, we could use only single value of 
the LS periodogram to check whether the presence of this 
periodicity is likely or not. In this case, the FAP is given by 
the well-known exponential distribution of any single value 
z of the LS periodogram, so that FAP = e~^. However, the 
case in which the frequency of possible signal is basically 
unknown is much more common. In this case, the FAP is 
provided by the distribution function of the maximum value 
of the periodogram (corresponding to the frequency range 
being scanned). 

The calculation of the latter distribution is a non- 
trivial task. The absence of accurate and/or rigorous an- 
alytic expression of this distribution (even for the plain 
LS periodogram) represented a significant trouble for as- 
tronomers for about three decades. In addition to the 
Lomb and Scar gle works, it is wor t hwile to ment io n her e 
the papers bv iHorne fc Baliunad (1 19861): iKoeij 
Schwarzenberg-Czernvl lll998al): Gumming et al.l 
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Cummind l|2004l ): iFrescura et all l|2008l ). We beheve that 



this obstacle was the main reason why basically no intricate 
extensions of the LS periodogram attained enough practical 
popularity so far. Theoretically, it is not really difficult to 
construct a periodogram where some fancy models of the 
data are used. Armed with modern computers, we even may 
evaluate such periodograms in practice, even if they rely 
on some CPU-greedy numerical algorithms. But what to do 
next? How to decide which of the signals detected are real 
and which belong to the noise? The only general solution 
available is the Monte Carlo simulation technique, which 
might be practically useful for the basic LS periodogram, 
but not for more complicated cases, unfortunately. 

Rather recently, a significant progress in this field was 
attained in the paper (Balucv 2003), where entirely analytic 
and simultaneously accurate approximations of the FAP are 
given, based on the results in the theory of extreme values 
of stochastic processes (the ' Rice method') . In a brief form, 
the main result presented in (|Baluevl|2008l ) for the LS peri- 
odogram is: 



FAP(2) < M{z) ^ We"'^^, 



(1) 



where z is the maximum periodogram value corresponding 
to a given frequency range, and W is the width of this range 
multiplied by a certain effective length of the time series 
(which is usually close to the plain time span). The symbol 
'<' in IT} means that FAP(2:) will never exceed M{z) and 
simulataneously M{z) represents an asymptotic approxima- 
tion for FAP(2), with the error decreasing for small FAP 
(or large z). The high practical importance of the approx- 
imation in IT} is founded on three things: (i) it is entirely 
analytic, eliminating any need for Monte Carlo simulations, 
(ii) its practical accuracy usually appears good or at least 
quite satifactory, and (iii) its possible errors never favour 
to more false alarms than we expect, since we deal with an 
upper limit on FAP. 

The LS periodogram can be easily generalized 
in multi ple ways to encornpass m or e comp l icated 
models (ISchwarzenberg-Czernv| Il998allbl: iBaluevI l2008l : 
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IZechmeister fc Kiirste il l2009l : iFerraz-MellJfigSlT " First, we 
can introduce some base model of an expected underlying 
variation (typically a long-term polynomial trend) and 



check whether the addition of a probe sinuos signal offers 
enough improv ement in y^. These cases have been briefly 
considered in (|Baluevl [gOOSl) and our general conclusion 
was that such a modification does not typically break the 
result |T}. Second, we can deal with more complicated 
(but still linea r) mode ls than just a sinusoid. In particular, 
in the work (|Baluevl [2009b) we considered the so-called 
multi-harmonic periodograms, where the periodic signal is 
modelled by a trigonometric polyn omial involving a few 
leadi ng terms of the Fourier series ([Schwarzenberg-Czernvl 
Il996h . In this case, the formula |T} is generalized to 



FAP(2) < M(z) ^ Wane 



-z n — 1/2 



(2) 



where are certain numbers depending on the degree n 
of the approximating trigonometric polynomial. Notice that 
n = 1 corresponds to the LS case. 

However, non-sinuous periodic signals, which are dealt 
with in astronomy, often obey non-linear models. T hen the 
use of the LS periodogram or periodograms from (iBaluevI 
[2008 . .2009h ) is not optimal, since the corresponding peri- 
odic variation might be fitted by an inadequate model. For 
instance, this is the case for lightcurves of variable stars 
and for radial velocity curves of spectral binaries involving 
elongated orbits. Theoretically, we could use a high-order 
Fourier expansion to approximate a non-sinusoidal period- 
icity, but this solution is obviously inefficient due to an un- 
necessarily large number of extra free parameters. The aim 
of th e pres ent paper is to extend the results from ([Baluevl 
(200^ and (lBaluevll2009bl ) to the case of an arbitrary model 
of the periodic signal, incorporating a few parameters in 
a non-linear manner. As we will demonstrate, we can ap- 
ply roughly the same technique (the Rice method) to this 
case, with the major difference that we should now deal 
with random fields instead of random processes. Namely, we 
will provide a closed approach to construct the limiting ap- 
proximation M{z) in the form We~^P{^fz), where P is an 
algebraic polynomial. 

The structure of the paper is as follows. In Section[2l we 
introduce a general definition extending the LS periodogram 
to the non-linear case. In Section^ we consider the problem 
of assessing the statistical significance of candidate periodic- 
ities detected with the non-linear periodogram. This descrip- 
tion is followed by an auxiliary Section|3]devoted to the ways 
of practical evaluation of the theoretical approximations of 
Section [3] In Section (5] we discuss the concequences implied 
by various noise models of the data. In Section |6l a couple of 
concrete practical applications of these results is supplied. 
In the first (rather tutorial) example, we aim to detect a 
periodic signal of arbitrary (but a priori fixed) shape, when 
the unknown parameters are the amplitude and the phase 
of the signal. In the second example, we consider a more 
complicated periodogram based on the so-called von Mises 
model of the signal, essentially exp(i/cosa;), which involves 
an additional non-linear parameter v. 



2 DEFINITION OF THE NON-LINEAR 
PERIODOGRAM 

Let Xi denote the outcomes of A'' observations made at tim- 
ings ti. The errors of these measurements are assumed to 
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follow Gaussian distributions and to be statistically inde- 
pendent (hence, uncorrelated). The standard deviations of 
these errors, ai, are assumed to be known a priori. We want 
to test, whether these observations are consistent with some 
base model of variation, or certain deterministic periodicity 
is also present. 

The data model to be tested for consistency with the 
data is fi-H{t,0n), where the vector Gn incorporates dn 
unknown parameters, which should be estimated from the 
data. We assume that this model is linear with respect to 
unknown parameters: 



(3) 



where the vector of base functions tfi-n (t) is set a priori. Typ- 
ically, the model nn incorporates a free constant term, and, 
possibly, a long-term polynomial trend with free coefficients. 
The Lomb-Scargle periodogram, by the way, assumes that 
H-u = 0, implicitly requsting some preliminary centering of 
the time series. 

The model of the periodic signal is given by 6, /), 
where the vector G contains d unknown parameters to be 
estimated from the data together with the frequency /. The 
united vector O/c = {67.^, 0} parametrizes the compound al- 
ternative model of the datcU, which is given by 



fiK:it, 0k:, f) = 0h) + 0, /)• 



(4) 



We denote the d-dimensional domain, where 6 is sup- 
posed to reside, by 0. The signal is supposed to vanish when 
belongs to some 'null domain' Oo C O and not to vanish 
when does not belong to Oo- Therefore, we wish to test, 
whether the data are consistent with the base hypothesis 
H : £ Q>o (implying that the model unit, 0n) fits the data 
satisfactory) or this base hypothesis should be rejected in 
favour of the alternative K. : £ Q\Qo (implying the model 
l-i-ici't,0K, /))■ The model /i may be non- linear with respect 
to 0. 

The unknowns On, 0, and / can be estimated using the 
least-squares approach. Under the hypothesis H, the best- 
fitting estimation of 0h can be obtained in result of min- 
imizing the functions Xni^n) = {{^ " fJ-n)^)- Under the 
hypothesis IC, the best-fitting estimations of 0n,0 and / 
should correspond to the minimum value of the function 
Xic{0ic,.f) ~ {{x — /iK:)^)- Since fin is linear, the minimiza- 
tions by 0^1 can be performed rapidly and precisely using 
the usual linear least-squares algorithms. The minimization 
of xk the remaining variables is equivalent to the max- 
imization of the non-linear function 



minx-H(0Ti) -mmxici0n,0,f) 



(5) 



which simultaneously characterises the improvement in the 
fit quality, which is achieved by means of adding to the 
base model the model of the periodic signal with given values 
of and /. Note that the maxima of (^{0, /) do not depend 



^ We use here all notation conventions used in llBaluevll2008l) . For 
instance, the braces {*,*,...} denote the association of vectorial 
or scalar arguments into a single vector, the angular brackets 
(*) denote the summation of the argument over timings ti with 
weights 1/o'f, and x ^ y = xy^ is the dyadic product of the 
vectors x and y. 



on the choice of the parametrization. That is, they are in- 
variable with respect to a non-degenerated transformation 
of the vector (and any non-degenerated linear transforma- 
tion of 0n)- 

We can perform the minimization over the frequency / 
in a traditional manner by means of looking for the highest 
peak on the graph of the function 

<f)=rnaxC(0,./), (6) 

which may be called the "non-linear least-squares peri- 
odogram". This definition means that for any fixed fre- 
quency / we perform the fit of our model via the remaining 
d parameters 0. The value of the periodogram characterizes 
the relevant advance in the x^ fit quality. When the model 
fi{t, 0, f) is linear with respect to 0, this definition of z{f) 
coincides with t he definition o f the linear least-squares pe- 
riodogram from (|Baluevll2008l ). 

In the majority of practical applications, one of the pa- 
rameters in is the amplitude K of the periodic variation. 
This means that 



fji{t,0,f)=Kh{t,i,f), 



(7) 



where the vector ^ contains d — 1 remairnng unknown pa- 
rameters of the signal. We assume that ^ belongs to some 
domain H in d — 1 dimensions, so that the domain B repre- 
sents the Cartesian product [0, 4-co) x H or (-co, -f 00) x H, 
and the null domain is the domain of zero amplitude: 
Oo = {K = 0} X H. For simplicity, let us firstly consider 
the case when d-u = and the hypothesis H states that 
the data do not contain anything but the white Gaussian 
noise. In this case, = {x^) and C,{0, f) — (xfi) — (^^) /2 — 
{xh)K — {h'^)K^ /2 is a quadratic polynomial of K, which 
can be easily maximized given fixed / and ^. This results 
in a least-squares estimation K* — {xh)/{h?), and in the 
maximum maxjy = rf^ /2, where ri{$,, f) represents the new 
function to be maximized by the remaining parameters. It 
can be expressed as 



(8) 



where ^i,/) = h{t,^, f)/^/Jh?). 

A similar result may be obtained for the case when the 
relation ((TJ is still valid, but the model is no longer 
empty. It is not hard to check that, if the models fi and fin 
were orthogonal in the sense that {hipn} ~ 0, the maximum 
of ( hy K could be expressed exactly in the same way as 
it was described in the previous paragraph. In the general 
case the models are not orthogonal, and we introduce the 
new model function 



(9) 



where Q6)„,e^ ~ {fn ® 'fin) is the d-u x d-u Fisher infor- 
mation matrix associated with 0n, and Qe„,x = ivnh) is 
the d-H X 1 Fisher information matrix for the parameters 0-h 
and K. Since the identity {hipn) ~ holds true, the new 
model of the signal is orthogonal to the base model. Now h 
should replace h in the expression for ^, so that 



fnit) 



(10) 



After that, we can directly calculate the quantity 77 from the 
equation (|8}. Finally, 



4 R.V. Baluev 



mEixC = 77/2 

K 

with 



(11) 



(12) 



The best-fitting values of ^ and / correspond to the maxi- 
mum of 77. 

Often it might be useful to assume that negative val- 
ues for K are not allowed. Then we should make a small 
amendment to the last formulae Ulip and ([12]). Namely, they 
can be used only for > 0, while for < we should set 
max_R- C = and K* = with the best-fitting values of ^ 
and / undefined. 

The formulae become more simple for an important 
practical case when d-u = 1 and ipn = 1 reflects a free 
constant offset of the data. In this case, let us first define 

x, = {x)/{l), K{iJ)^{h)/{l), D = {{h-h,f), (13) 

and then evaluate 

rj = {{x--x,){h^hc))/VD, maxC = r?V2, K^Tj/^^.{U) 

Note that the quantity (1) represents the sum of weights of 
all observations. 



3 APPROXIMATING THE FALSE ALARM 

PROBABILITY USING THE RICE METHOD 

3.1 General introduction to the problem 

In this paper we are interested in the false alarm proba- 
bility (FAP) associated with the observed maximum peak 
-Zmax = rnaxos5/s5/„a^ z{f), where /max is some a priori given 
maximum frequency. This false alarm probability can be for- 
mally defined as follows: 



Pr{30 G e, / G [0, /max] : C(0, /) > ^max}, (15) 



again with the probability operator calculated under the null 
hypothesis (no actual signal in the data) . We can see that to 
assess the FAP, we should know the distribution function of 
the maximum values of C,{0,f). This function represents a 
real-valued random field defined on a domain of dimension 
d+1. 

One could claim that since the model of the extra sig- 
nal contains d free parameters and since the quantity z{f) 
is the logarithm of the likelihood ratio statistic, the dis- 
tribution of 2z{f) (for a fixed /) should tend to the 
distribution with d degrees of freedom, when the number 
of observa t ions g rows. This was assumed, for instance, by 
ICummin3 l|2004 ) who considered the case of Keplerian ve- 
locity variation with four unknown parameters (plus the 
period). We must caution the reader that in general this 
assumption is incorrect because the conditions of the corre- 
sponding limiting theorem are not satisfied. The most im- 
portant reason comes from the fact that the parameters ^ 
have no physical sense (are undefined) when K = 0. Speak- 
ing mathematically, they are not identifiable for K = 0. The 
lack of identifiability under the null hypothesis usually de- 
stroys the usual a symptotic properties of the likeliho od ra- 
tio (and x^) tests I Dacunha-Castelle fc Gassiatlll999t ). This 
is because we typically just cannot construct a valid Tay- 
lor series of the signal model at K = 0, except for rare 



special cases. Without that we cannot linearize this model 
under the null hypothesis, which is critical for the valid- 
ity of the asymtotic distribution. An exception is pro- 
vided, for instance, by the LS periodogram with the har- 
monic model of the signal. In this special case, we are able to 
perform the following re-parametrization: K cos{27vft+X) = 
acos{2nft) + bsin{2nft). While that phase A was a not iden- 
tifiable at iC" = 0, the new parameters a and b are already 
identifiable and even linear. In this case, the distribution 
of each single value of 2z{f) is indeed the x^ one with two 
degrees of freedom. Unfortunately any similar trick is not 
possible for the majority of the other models, even appar- 
ently simple ones. 

The things get even worse for the more practical case 
when / is also unknown. In this case the frequency, trated 
as a new free parameter, is not identifiable (at K = 0) even 
for the LS periodogram. Actually, we may note that non- 
identifiability of the frequency is the primary obstacle that 
made the treatment of the significance levels of the LS pe- 
riodogram so difficult and non-rigorous over decades. The 
LS periodogram of the noise containes an infinite sequence 
of similar noisy narrow peaks, but none of them can serve 
as a reference position for a quadratic Taylor approximation 
that would be valid in the whole frequency range. If not 
that obstacle then we could just use the chi^ distribution 
with three degrees of freedom (two for a and b plus one for 
/) to approximate the n ecessary distr ibution of the LS pe- 
riodogram. However, in (|Baluevl 120081 ) we managed to deal 
with this obstacle using the so-called 'Rice method', treating 
the noisy LS periodogram as a random process depending 
on a real argument /, which was a single non-linear param- 
eter of the model. The case of non-linear periodograms just 
adds more non-linear arguments of but the issue of their 
non-identifiability a,t K — remains qualitatively the same. 
Therefore, we may try to treat this more general situation 
using the same or similar method. 

Of course, it is hardly possible to derive an exact ex- 
pression for FAP, but we would be pretty satisfied if we find 
at least an approximation analogous to what we obtained 
in our previous works. Namely, we aim to obtain something 
like 



FAP(2max) < M{Zn 



(16) 



where M represents simultaneously an upper bound for FAP 
and its more or less good asymptotic approximation for large 

^max 

(smaU FAP). 

3.2 Basic ideas of the Rice method 

The modern comprehensive theory of the Rice method and 
relevant topics can be found in the reviews l|Kratzl [2OO6I : 
lAz ais fc Wscheboi|[2009l ). Here we present only a very brief 
extraction of the results that are most useful in our present 
paper. Suppose we deal with some arbitrary random pro- 
cess or field Z{x) and we need to find the probability that 
its maximum (within some domain a; G X) will lie beyond 
a specified level Z{x) = z. In our signal detection task this 
probability is equal to FAP{z), and this is obviously a com- 
plementary probability to the distribution function of the 
maximum of Z. The general Rice method to estimate these 
thing is based on two main points. First, we should construct 
some derived integer random variable N(z), such that the 



Detecting non-sinusoidal periodicities 5 



event Af{z) = is equivalent (or almost equivalent) to the 
event {Z{x) < zWx £ X}, and the event M{z) 1 is (al- 
most) equivalent to {3a; £ X : Z(x) > z\. The boundary 
event when Z{x) 5C z everywhere in X and there is only one 
or a few x such that Z{x) — z should usually correspond to 
Miz) — 1. The word "almost" refers here only to the effects 
at the boundary of X (the boundary maxima); if we some- 
how knew for sure that boundary maxima are impossible 
than this word can be just omitted. 

For random processes a good choice for A/" is the number 
of up-crossings of the specified level; i.e. the number of points 
X such that Z{x) = z and Z'{x) > 0. For random fields 
the term of up-crossing is meaningless, and in this case we 
choose Af{z) to be the number of local maxima beyond z 
(and inside X), that is the number of points x where Z > z, 
Z' = 0, and Z" is negative-definite. 

Given such a counter variable A/", we can estimate the 
required false alarm probability, i.e. the probability for Z{x) 
to exceed a given level z somewhere in X, as 

FAP(z) < M{z) = Mboundary(z) + t{z), 

t{z)=EU{z). (17) 

Here the term Mboundary refers to the maxima attained at 
the boundary of X; it may or may not be neglected, depend- 
ing on other conditions of the task. We will discuss it in 
detail later. The primary term is r(z), which is equal to the 
mathematical expectation of the selected counter. This for- 
mula is basically the same as (|16p with concretized function 
M{z). 

The second point of the Rice method is the generalized 
Rice formula for r. For the random processes we have 

t(2) = jEi[Z'{x)]+\Z{x) = z)pz{z)dx, (18) 

X 

where the functions p stand for the probability density func- 
tions of the quantity Z{x) shown as indices (this p also de- 
pends on x), and [a]+ = max(0, a). If necessary, the expres- 
sion (jlSp can be obviously rewritten in terms of the joint dis- 
tribution of Z{x) and Z'{x), see (Balu cv 2008). The name 
"Rice method" and "Rice formula" are after iRicel (|l943 ). 
who constructed his original Rice formula for the stationary 
Gaussian random process. 

In this paper we will use the generalized Rice formula 
for random fields. Actually, now we have ev en three formulae 
of th at type. The first one is introduced in jAzai's fc Delmaj 
it can be written down as 

oo 

t{z) = j dZ j E(det[Z"]_ I Z' = Q-Z)pz.z'{Z,Q)dx, (19) 

2 X 

where det[.Z"]_ is equal to |detZ"| when Z" is negative- 
definite, and zero otherwise. This formula is generally 
similar to (|18p. al t hough considerably more complicated. 
lAzais fc WscheboJ (|2009l ) introduced in their Chapter 8 a 
variation of ([19| with det[Z"]_ replaced by |detZ"!. Such 
replacement obviously somewhat increases the right-hand 
side of (|19p . keeping its upper-limit property, but making the 
computations a bit more easy. It counts all the critical points 
of Z{x) above z, not just the local maxima. However, both 
these formulae are usually too difficult for computations, 
and the formula that is typically used in practice contains 



just the "naked" Aet Z" instead of |detZ"| or Aet[Z"\-. 
Such a formula gives the mathematical expectation of the 
Euler-Poincare characteristic (EPC) of the level-section set 
{a; G X : Z(x) > z} (which is also called as the "excursion 
set"). 

Unfortunately, the quantity E(EPC) does not strictly 
retain the upper-limit property of EA/". However, it is 
known (at least fo r Gaussian fields, see e.g. Chapter 8 by 
lAzai's fc Wscheboil 12009,)) that for large levels z the quan- 
tities EA/" and E(EPC) are asymptotically equivalent, and 
their difference decreases rather quickly (we will detail this 
below). This is because beyond a large z all critical points of 
Z are local maxima with almost unit probability, so the rel- 
evant excursion set represents a number of (filled) ellipsoids 
encompassing the positions of these maxima. Each such a 
filled ellipsoid has EPC = 1, and thus EPC ~ A/" for large 
z. All this means that we can typically use E(EPC) as a 
good practical approximation for r. Even if E(EPC) does 
not provide an entirely strict upper bound, the relevant er- 
rors usually appear negligible for practical levels of z. Of 
course we must admit that "usually" or "typically" is not 
the same as "always", but nonetheless this approximation 
appears quite satisfactory in the examples considered below 
in the paper, as well as in a few other cases that we prepare 
for a future publication. 

The Rice method usually provides good practical accu- 
racy, so that the mentioned upper bound p6|l appears close 
to the actual value of FAP, at least for practically important 
case of small FAP levels. The Rice method does not belong 
to widely-known methods, because it is not mentioned in a 
typical handbook on mathematical statistics. Therefore, its 
usage in applications (e.g. in astronomy) is rare. However, 
rare does not mean absent: we found that some variant of 
this method was applied bv lBardeen et aO l|l986t ) to study 
cosmological density fluctuations, which were modelled by a 
Gaussian random field. 

3.3 Applying the Rice method to non-linear 
periodograms 

Mathematically, the condition of max(^ ^ z is equivalent to 
that of max \ri\ ^ \/2z (case of arbitrary K) or maxr; ^ y/2z 
(case oi K ^ 0). Therefore, we need to calculate the dis- 
tribution of the maximum values of the random function 
r;(^, /) to estimate the FAP. In this subsection we limit our- 
self by the single-sided case K ^ Q, bearing in mind that to 
obtain the formulae for the case of arbitrary K we need to 
double the right-hand side of HZ}, because then we need to 
honour the maxima of rj above \/2z as well as its minima 
below —\/2z, which are entirely analogous. 

The random field r;(^, /) possesses quite simple statisti- 
cal properties. From the definition ((Sj it clearly follows that 
if the noise in our observations is Gaussian, t) represents a 
Gaussian random field. It is easy to check from ((Sjl and (|10|l 
that Ery = and the varian ce P?7 = 1. This p l aces u s in the 
framework of Theorem 1 bv lAzai's fc DelmasI (|2002l ) . In this 
case we can use (|17p with 

[n/21 

r{z) ^ E(EPC) = a,Pn+i-2j{z). (20) 
In this relation, the integer n represents the number of free 
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arguments of rj. It is equal to dim^ = d — 1 or dim^ + 1 ~ d, 
depending on whether we consider the case of fixed or free 
frequency /. Below we will only consider the more practical 
second case with n — d, but in order to avoid misunder- 
standings we prefer to keep different notations for n and d. 
The notation [*] stands for the integer part of the argument. 
The functions Pk (z) represent the tail probability associated 
with the distribution with k degrees of freedom: 



Pk{z) = 



r(fc/2) 



fe/2-l 



^dx. 



(21) 



Fn(z) 1 
0.1 
0.01 
0.001 
1e-04 
1e-05 
1e-06 
1e-07 
1e-08 
1e-09 



10 



15 



F5 
F4 
F3 
F2 
F1 
20 



When z oo, the error of the approximation in (|20| 
has the decrease rate quicker than e"'^"'"''-'^ with some posi 



Figure 1. The graphs of several functions Fn(z). 



^("-l)/2 



This 



five 5, while r itself typically decreases as e 
means that the relevant relative error decreases qucker than 

g-.«/^(n-l)/2^ 

We can see that the expression (|17l) involves a linear 
combination of tails with different numbers of degrees of 
freedom. However, the sum of the coefficients aj is not nec- 
essary unit, so that the expansion in \n\ is not a mixture 
of distributions in the rigorous meaning of this notion. For 
large z, the first term with Pn+i ~ z^"~^'^^^e~'^ dominates, 
whereas the relative magnitudes of the remaining terms de- 
crease as ~ 1/z-' . 

The calculation of the coefficients aj represents a large 
technical difficulty. These coefficients are pr oportional to the 
quant ities fc2j introduced in Theorem 1 bv lAzai's fc Delmaa 
l|2002ll . We altered the original coefficients k2j in order to 
have the functions Pk explicitly in the sum (|20p . 

Let us denote the variance-covariance matrix of the gra- 
dient of as G (it is denoted as A by lAzai's fc Delmaa 
l|2002l )'). It can be easily calculated from the eq. ([8]). The 
part of the matrix G corresponding to only the parameters 
^ is equal to 



dtb dib 



(22) 



and the remaining elements due to the frequency parameter 
can be expressed in an entirely analogous manner. 

At first, let us consider a simplified situation when G 
does not depend on $ and / . Then we c an use the proposition 
"a" of Theorem 1 in (|Azais fc Delmasii2002 '). We find that in 
this case the coeffiecients Uj (or k2j) are prop ortional to the 
coeffi cients of the Hermite polynomials Ha (|Korn fc KornI 
1 19681 . §21.7), so that after some elementary transformations 
we have 



r ^ AF„(z) 
with 



(23) 



A = 



VdetG 

OO 



OO 

dx 



(24) 



The last equality in (I24|) can be just checked by direct differ- 
entiation or it can be derived "honestly" using the Rodrigues 
representation for Hn- Notice that Hn are normalised here 
so that their highest coefficients are equal to unit and the 

2 

weighting function is . 



Let us write down a few of the functions _F, 
Fl(2) = e-^ F2{z) = e-'-/l, Fz(z)=e 



1 

'-2 



F^z) 



/i, Fs{z) = e' 



2 o , 3 

2 — ,iZ + - 

4 



(25) 



We plot the graphs of these functions in Fig. [T] Also, a no- 
table recursive relation F„+i{z) — —F^(z)^/z can be used. 

When G is non-constant, which is a more practical case, 
we can easily calculate only the primary term in (|20l) . We 
have in this case 

r ^ yle-"^*"-''''^ (26) 

with the relative error of ~ 1/z, which is worse than the 
error of (1201). Here 



A 



27r("+i)/2 



df / VdetGd^, 



(27) 



which is an evident generalization of A from (|24|l . 

The remaining z-power terms are different from the case 
of constant G, and usually they are very hard to evaluate, 
because they involve conditional covariances of the second- 
order derivatives of 77 in quite unpleasant combinations (see 
proposition "b" of Theorem 1 by Azais & Dclmas 200^). It 
might be noticed that for a simple case n = 1 we have only 
one term in (I20II. 



3.4 The role of the boundary effects 

The quantity r on itself does not yet provide a closed so- 
lution of the problem. According to (|17|l . we need to assess 
a similar term, which is related to the number of the so- 
called "boundary maxima" (which are the local maxima of 
the random field restricted to the domain X boundary). 

This term takes into account the situation when all lo- 
cal maxima in the domain interior appear smaller than some 
boundary maximum. It can be calculated using very simi- 
larly to the term t, we just need to consider the restriction 
of our task to this boundary and apply the Rice method 
in a recursive manner. We should approximate the quantity 
Afboundary by a formula similar to p7p and (|2Up . but with n 
replaced by n — 1. Hence, its relative contribution decreases 
for large 2, but at a rather slow rate of ~ Ij ^/z. 

We however must take care of one small but rather im- 
portant thing. When we restrict our field r\ to the bound- 
ary of our domain, a maximum at the boundary does not 
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necessarily represent a good candidate for the global maxi- 
mum. Whether a particular boundary maximum is "good" 
or "bad", depends on the sign of the derivative of rj in the 
direction of the outward normal to the boundary (i.e., the 
projection of the gradient -q' to this normal). If it is nega- 
tive, we can definitely find larger values of rj when stepping 
from the boundary inwards. Thus, such a boundary maxi- 
mum can never provide the global maximum of 77, so we call 
it "bad". When counting the boundary maxima, we must 
filter out all "bad" ones, keeping only the "good", which 
offer positive outward derivative. Mathematically, all this 
requests from us to replace in p9p the gradient Z' and the 
Hessian Z" by their projections to the tangent plane to the 
boundary, (denote them, say, and Z"), and consider the 
operator E and the probability density Pz,z'^^ conditionally 
to an additional constraint Z'j_ > (where index _L means 
the projection on the outward normal to the boundary sur- 
face) . 

In practice this usually just means that we need to halve 
(precisely or approximately) the estimated number of the 
boundary maxima, rboundary. We must admit that this is- 
sue has not been investi g ated in the literature with enough 
details. lAzai's fc Delmad (|2002l ) prove this "1/2-rule" under 
certain restrictive assumptions, among which the most im- 
portant is the requirement of constant G. From their Theo- 
rem 3 it follows, basically, that 



.^^boundary — ^ '^boundary ■ 



(28) 



where Tboundary cau be evaluated in essentially the same 
manner as r, considering the restriction of the task to the 
boundary surface (which implies, in particular, a decrease in 
n), and under ". . ." we mean here some ter ms having faster 
decrea se rate than Tboundary. Although lAzai's fc DelmasI 
(|2002l ) leave the case when G 7^ const aside, after inves- 
tigation of their detailed proofs, we do not find an obsta- 
cle in generalizing their single-term asymptotic formula for 
^boundary to a more general case with G 7^ const. Moreover, 
we find that the main neglected term contained in ". . ." 
of (|28|l has the relative magnitude of ~ 1/z. 

Since in the general case we anyway keep only the great- 
est terms in r and similar quantities, we can use a two-term 
approximation like 



FAP(z) < M{z) 



. (n-l)/2 1 . n/2- 
■^■^ \ -^boundary 



.(29) 



The principal error of the right-most expression has the rel- 
ative magnitude of ~ 1/z and is due to the omitted terms in 
T, while the omitted terms in Tboundary are of even a smaller 
order ~ 1/z^^^. 

In more complicated cases, the boundary itself may 
be non-smooth due to "sub-boundaries" of smaller dimen- 
sion (edges, vertexes), which will generate extra terms in 
A/boundary. It is rather difficult to formulate a simple and 
general receipt of how to deal them with, because the geom- 
etry of the boundary might be quite complicated in general. 
However, later we explain this procedure on a concrete ex- 
ample of the von Mises periodogram, when the parametric 
domain is a rectangle. 



4 EVALUATING THE COEFFICIENT A 

4.1 Assuming the uniform phase coverage 

It would be useful to construct the analytic expressions of 
the FAP for the case when our observations are distributed 
approximately uniformly in time. We assume that the tim- 
ings ti, when they are phased to some frequency /, cover 
the relevant phase more or less uniformly. In this case, we 
can approximate the summation {*) over the time series 
by means of integration over its time span [ti,t2]. Saying 
it more accurately, we approximate the time-series average 
{*)/{l) by the integral average over the time span. More- 
over, the periodic character of the model fi usually allows for 
this integration to be performed over a single period only. 
The periodicity of h implies that h{t, ^, /) = g{2TTft -I- A, i^), 
where the function g{x,v) is 27r-periodic in x and A is the 
phase parameter. The remaining d — 2 parameters form 
the vector v. This means that H = [0, 2%] x T and = 
[0, +00) X [0, 27r] X T, where T is some [d — 2)-dimensional 
domain of parameters v. Now we can approximate, for in- 
stance, the mean value of h as 



g(x,v)dx = wg, 



(30) 



where P = 1// is the period of the signal, w = (1) is the 
sum of the weights and the overline denotes the continuous 
averaging over the periodic variable x. Note that usually we 
deal with the case when the base model fin incorporates a 
constant term. According to (|10p . this means that we should 
first subtract the constant (g) /w « 'g from the model of the 
signal. We will assume that g was already centered. 

The same arguments lead to the equality {h^) ~ wg'^. 
The latter expression does not depend on / and A, so we 
can write down the derivatives of ip as 

2TTt g'^ dip 

y "^y V "'^ V "'^ V 

where the function g and its derivatives g'^ = dg/dx and 
gl. = dg/dvi outside the averaging are calculated for x = 
2-Kft -\- A. This allows us to calculate the elements of the 
matrix G: 




dip_ 
d\ 

dip dip 
d\ dui 

dip dip 
dvi dun 



g^glj gg'x ggl. 



gl.gl, gg'u,ggii 



(32) 



which do not depend on the frequency and phase. When 
calculating the elements of the matrix G, we also deal with 
summations like, for instance, 



diP^ 
df 



wg^ 



t^g',\27vft + \)y 



(33) 



The P-periodic function g'^ can be expanded in t he Fourier 
series with the constant term being equal to g'^^. If ti span 
a large enough number of the periods approximately uni- 
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formly, the summation in (|33|) averages out all Fourier har- 
monics, except for the constant term, which results in 



Of 



(34) 



Here, the line over denotes the weighted averaging of the 
squared timings: = {t^) / (1), which can be easily evaluated 
directly (without approximating it by a continuous integral). 
Similar arguments lead to 



dip dtp \ 

dfdx/ 



2ntq, 



I dtp dtp 



2TTtVi 



(35) 



where t is the weighted average of ti. Thus, the full matrix 
G can be written in the following block form: 



2Tvtq 

2TVtV 



2-n-tq 

q 

V 



2TVtv' 

T 

V 

V 



(36) 



where the elements of the vector v and those of the matrix 
V are defined in (|32[1 . In this approximation, the matrix G 
only depends on the parameters v. Using simple elementary 
transformations of G we can finally obtain that 

det G « TTq^T^tf det R, (37) 

where R — \l —v®v / q (that is, Rij — Vij—ViVj/q) and Tefi = 

^ 47r {t-^ — is the effective length of the time series, as 

it was defined in l|Baluevll2008l '). Integrating ((27)) over A and 
substituting (|37|l . we can write down: 

A ^ J gVdetRdi/, W = U.^na- (38) 



The factor A can now be substituted in (|23|) for the use 
in p7|l and (|16[l. In the degenerate case n = 2, we have 
dimi/ — and put det R = 1 by definition. 

The main advantage of this method of calculation of A 
is that the result depends on a particular time series only 
via the single quantity T^s. Given the model g and the do- 
main T, we can evaluate an approximation for A once and 
then use it for all time series, substituting the proper values 
of /max and Teff. For the LS periodogram, fo r instance, w e 
have Aai W, which is in full agreement with (|Baluevll20di ). 
The main disadvantage is that this approximation may have 
insufficient accuracy in practice. 

When deriving this approximation for A, we assumed 
the uniform phase coverage for all frequencies / in the scan 
range. When the original time series is not uniform, this 
assumption may become invalid at some frequencies /, cor- 
responding to periodic leakage patterns of ti (and including 
also the zero frequency). However, these perturbations can 
usually appear only inside very short frequency segments 
(A/ ~ l/Tcfi) around the leakage frequencies; for the most 
values of / in the range [0, /max] the phase coverage is still 
uniform. But our formula for A in (|27|) involves an inte- 
gration over the wide frequency band A/ ~ W/TcB with 
W ^ 1, and hence the perturbations of its integrand have 
almost no effect on the result, because they are limited by 
so short frequency segments. This means that the accuracy 
of our approximation for A does not significantly degrades 
even for non-uniform time ser ies. For sinus oidal signals this 
was already demonstrated in l|Baluevll200"i ). where we have 



shown that even ultimately strong aliasing has only a negli- 
gible effect on the resulting A, when W > 10. 

In the case of non-sinusoidal signals, an important 
source of the errors of the approximation (|38p comes from 
another side. If the signal model contains some quickly vary- 
ing structures, e.g. narrow peaks, the observations may cover 
these structures with insufficient sampling, so the resulting 
approximation for A may appear poor even when ti are per- 
fectly uniform. This effect is important for the von Mises 
periodogram below, for example. 

4.2 Evaluating A directly 

The direct evaluation of the factor A by means of substi- 
tution of (|22l) to (|27|) involves rather unpleasant manipula- 
tions with huge formulae, especially when we work in general 
terms of Section (2] To simplify them, let us write down the 
gradient of the model h over the compound vector of all 
non- linear parameters o) = {/, A, f}: 



dh r„ , , / /I 

7 = ^ = {2ntg^, g^, g^\ . 
Then define 

Qe„,A' = i'Pug), 

T = (7® 7) - Qe„,^Qe^,e„Qe„,a., 
y = (»7) - Qe„,u,Qe^,e„Qe.„,if, 



(39) 



D = (/)-Qe„,A-Qe„,e„Qe„.K, (40) 

where (p-u is the functional base of the linear null model Q . 
Finally, 

We may also offer another evaluation sequence. Let us 
construct the full Fisher information matrix of all the pa- 
rameters involved {O-u, K, oj): 



Q 



T 2 



19 



T 

37 
7 ® 7 



(42) 



Please notice that the triangle braces, standing for the 
weighted summation over the time series, are still here. Now 
let us apply the Cholesky decomposition: Q = LL"^ with L 
being a lower-triangular matrix. Then write down L in the 
same block form as Q in (I42II : 



(43) 




Notice that Ikm-^ and Iu:,k are vectors, and Ik,k is a 
scalar. Obviously, Lg^^g^ is a lower-triangular matrix of the 
Cholesky decomposition for Qe^.e^, and Lt^^^j is another 
lower-triangular matrix. From the matrix L definition and 
from (|4ip we can easily derive two remarkable relations: 



D = li 



(44) 



Moreover, it is clear that VdetG — detLuj,ui/lK,K- There- 



fore, to find the integrand in (|27|) we just need to calculate 
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det L„,„, which is simply equal to the product of its diagonal 
elements, and then divide the result by ifc k- 



Therefore, the final procedure to evaluate V det G is as 
follows. 



(i) Evaluate Q using its definition (|42[) . Its size should be 
d-H+d+ 1. Note that it is important to preserve the ordering 
of the parameters as On, K, uj, while the ordering inside 
and inside tiJ is not important. 

(ii) Perform the Cholesky decomposition of Q. It is very 
quick and numerically stable procedure. 

(iii) On the basis of only diagonal elements of the 
resulting Cholesky matrix L, construct the combination 

(nf=dti''+2 / ('d„+i,<i„+i)''. It is equal to what we seek. 

The quantity V det G must be further numerically integrated 
over the parameters a;, according to (|27|) . Note that this 
includes the integration over the frequency / and over the 
phase A (which is now non-trivial). 

Now, let us limit ourself to the most important practical 
case when the null model involves only a free constant term: 
d-H — 1, ip-H = 1. In this case the matrix Q is considerably 
simplified: 



Q 



1 g 
9 9' 
7 57 



7 

T 

37 
7 ® 7 



(45) 



The round-off errors may destroy the positive-definiteness 
of Q, which is critical for the Cholesky decomposition. To 
reduce this effect, the computational sequence can be trans- 
formed to something similar to the formulae (|13|l and (|14p . 
This will be some hybrid approach to evaluate V det G be- 
tween the two ones that we have already discussed. Namely, 
first we should center the functions involved: 



9c = {g)/{l), 7c = (7)/(l), 
9^9-gc, 7 = 7 - 7c- 
After that, we need to evaluate 

D = if), 

V = (37), 
/3 = 1-gy/D, 

which imply that 
G = {/3®/3)/Z). 



(46) 



(47) 



(48) 



We have no need to evaluate G itself. Instead, we may per- 
form the Cholesky decomposition of (/3 (8> /3) . Dividing the 
product of the diagonal elements of the resulting Cholesky 
matrix by D'''^'^ , we obtain V det G. 

The obious advantage of the direct method to evalu- 
ate the factor A is that it allows for high accuracy limited 
by only round-off and numerical integration errors, not re- 
lying on any approximating assumptions. The disadvantage 
is that it is considerably more slow than in Section [4.11 al- 
though in practice the relevant computation time should be 
comparable to the time of a single evaluation of the peri- 
odogram itselfQ Therefore, this is still much faster than e.g. 
Monte Carlo simulation, where this periodogram has to be 



re-evaluated thousands of times before we reach a reliable 
FAP estimation. 



5 UNKNOWN NOISE LEVEL 

In Section [21 we have assumed that the standard errors Oi 
of observations are known a priori. In practice we often do 
not know them with enough precision. A commonly used 
model is given by — K./wi, where the quantities Wi deter- 
mine the weighting pattern of the time series and the fac- 
tor K remains unconstrai ned a priori. S imilar problem was 
considered in the paper (|Baluevll200"9ah . In this work, the 



model CT, 



' mcas,! 



-I- (Tj was considered with o 



mcas,! 



being 



the 'internal' measurements variances, known a priori, and 
parameter crj being the unconstrained variance of the extra 
'jitter'. In these cases, we cannot calculate the least-squares 
periodogram z[f), since we cannot calculate the values of 
the functions themselves. 

The general approach for solving such problems is based 
on the likelihood ratio test. The logarithm of the likelihood 
function for our observations, which are contaminated by 
random mutually independent Gaussian errors, may be writ- 
ten down (specifically for the hypothesis T-i and K) as 



n,K = 



1 

4e 



+ In cr,^(p) 



-l-const .(49) 



Here the full variances (t^ depend on the extra parameters 
p, which should be estimated from the data together with 
the usual parameters {0-H,6,f) of the model curve. These 
estimations are obtained in result of maximizing the corre- 
sponding likelihood function over all of the parameters to 
be estimated. After that, we could construct the logarithm 
of the likelihood ratio statistic as the maximum (over the 
frequency /) of the likelihood ratio periodogram 



Z{f) = max In /Ik (p, 0k;,/) 
p,Sk. 



max In C-h{p, Oh)- 



(50) 



This function may give the basis for signal detection in the 
general framework, when p is not known a priori. However, 
for the aims of reduction of the statistical bias in p, it is 
better to use the following modifications of the likelihood 
functions and of the likelihood ratio periodogram: 



1 

Z{f) = IK 



+ In (Ti (p) 



max In Cic 



max In Cn 



(51) 



where A^^ = N ~ d-u, Nk. = N - d^, and 'y'u,ic = N-hx/N. 
This mo dification was discussed in details in the paper 
(|Baluev| [2b09a). 

For the popular practical case af = K/wi with Wi known 
a priori, the periodogram Z{f) re presents a dir ect exten- 
sion of the periodogram Z3{f) from (|Baluevll2008l ). It can be 
constructed now as the maximum (over 9) of the non-linear 



^ We will typically evaluate this periodogram on some multidi- 
mensional grid in the space of all the parameters lj, not just on a 



frequency grid like in the LS case. In terms of the computational 
demands, this procedure is roughly equivalent to the numerical 
integration over the same space. 
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function 

C3(0,/) = -^ln 



FAPsingic < Msingic(2) ~ y/qe 



(54) 



(52) 



Note that the ratio C/Xh is independent of «, so that ("3 can 
be calculated regardless the factor k is unknown. 

To assess the statistical significance of the peaks on the 
likelihood ratio periodogram, we need to know the distribu- 
tions of the maxima of Z(f), as previousl. Now we cannot 
just apply the results from the previous sections directly. 
However, we may use the asymptotic large-sample proper- 
ties of the likelihood function. 

To do this we have to assume that the new parame- 
ters p do not introduce any extra pecularities like e.g. extra 
non-identifiability under H. This is normally true. The cal- 
culations involving quadratic Taylor expansion of the like- 
lihood function near the point K — 0, show that in the 
asymptotic N ^ 00 approximation for the quantity looks 
exactly the same as its linear-case definition in (|8]) and (llOf) . 
This means that the joint distribution of rj and of its gradi- 
ent remains asymptotically the same as in the genuine linear 
case. Concequently, all the theory of Section |3] remains valid 
for the likelihood-ratio periodograms as an asymptotic ap- 
proximation for N ^ 00. The same holds for the modified 
likelihood functions and for the modified likelihood ratio pe- 
riodogram (|5ip . since this modification does not introduce 
any change in the asymptotic behaviour. 

For the periodogram 23(7) and linear models of the sig- 
nal, the approximations to t he perio dograms distributions 
for arbitrary A'' are given in (|Baluev| [2008V When A'^ — > 00, 
these approximations indeed rapidly converge to those ob- 
tained for the original least-squares periodogram. This pro- 
vides an independent confirmation of the arguments from 
the last paragraph. Therefore, for large datasets, we can ap- 
ply the analytic estimation of the FAP for the periodogram 
Z3(/) and other non-linear likelihood ratio periodograms in 
the same way as it was described in the previous sections for 
least-squares periodograms with known noise uncertainties. 



6 PRACTICAL EXAMPLES OF NON-LINEAR 
PERIODOGRAMS 

6.1 Detecting periodic signal with a fixed 
non-sinusoidal shape 

Let us consider the case d — 2 with incorporating only the 
amplitude and phase of the periodic signal to be detected. 
That is. 



f^ = Kg{2nft + X), 



(53) 



where the 27r-periodic function g{x) is given a priori and is 
centred so that 'g — 0. This function determines the shape 
of the putative periodic variation. 

Notice that the term Mboundary in (|17|) may now only 
appear due to the boundary points of the frequency segment. 
The phase A is a periodic parameter defined over the self- 
closed circle which basically does not have a boundary. In 
other words, all maxima of 77 over A are local maxima where 
drj/dX — 0; no other maxima are possible in A. 

In this simple case we express FAP separately for the 
fixed-frequency and unknown-frequency cases, using the ap- 
proach of Section 14.11 In the fixed frequency case, we find 



In the case of unknown frequency. 



FAP„ 



< AW (2) + Msi„gie(2) ^ qWe-'^yG + ^e-".(55) 



The term Mboundary IS equal to Mgingic here, since we have 
two end points of the segment [0, /max] and each should be 
counted as half0 This term can be safely neglected in (|55|l 
anyway. 

Notice that as long as the approximation of the uniform 
phase coverage is valid, the matrix G appears here almost 
constant, so that we can use the simplified formulae (|24p . 
This means that although in the right-hand side of (|55p we 
omitted a term of the order of e~'' /^/z, corresponding to 
the term with ai of (|20p . the coeffici ent ai is itse lf negli- 
gibly small. As we have discussed in (|Baluevll200^ ) for the 
sinusoidal model, the approximation of the uniform phase 
coverage works well even for time series with ultimately 
strong spectral leakage. This is because the aliasing/leakage- 
induced errors are concentrated only within a few narrow 
frequency segments, and they thus have only a negligible ef- 
fect on the quantities expressed by an integral over a large 
frequency band (like A). However, for non-sinusoidal signals 
this approximation may appear poor due to reasons unre- 
lated to the spectral leakage effects; namely it may be poor 
when g{x) demonstrates narrow peaks that our observations 
are unable to cover with enough dense sampling. 

In the general case q ^ 1. This inequality can be clearly 
derived by means of applying the Parseval identity to the 
Fourier series for g{x) and g'ix). When jr is a harmonic 
function, we deal with the LS periodogram. In this case, 
^2 _ gii — 112, and q = 1. This allows us to entirely 
reproduce the exponential single value distribution of the 
LS periodogram, FAPsingio = e~^, and the Davies boun d 
FAPmax < We-^^-h e"^ from the paper (|Baluevll2008l ). 
When g{x) contains at least two Fourier terms we have 
g > 1, so the minimum g = 1 is attained for the sinusoidal 
and only sinusoidal variation. 

Just as a non-trivial example, let us consider the case 
when g{x) has a sawtooth shape: during the first half of its 
period it decreases linearly from 1 to —1 and during the 
second half it increases linearly from —1 to 1. In this case, 
= 4/7r^ g2 ^ 1/3^ and g = 12/ti'^ ^ 1.216. 



6.2 The von Mises periodogram 

Let us assume the following non-linear model for the periodic 
signal: 



g{x,v) 



exp(!/ cos x) 



v>Q. 



(56) 



In this definition Jo stands for the modified Bessel function; 
we need it to satisfy the condition 'g = Q. We can see that 
this function is 27r-periodic m x. At v = Q we have a formal 
singularity because g[x, 0) = and (|10p becomes degenerate. 
We can easily remove this degeneracy by making a replace 

q(x, v) V ( 2 1^ 

= cos T 4- — I ens X — — 

2. 



~g{x,u) 



This is because there is a 50/50 chance that such a boundary 
value is actually a boundary minimim rather than a maximum, 
depending on the sign of the derivative in this end point. 
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(58) 



Figure 2. The graphs of the von Mises function g{x, v) for v = 
(the sinusoid), = 3, and v = 100 (the most peaky case). All plots 
are prescaled to always cover the range [0, 1] in the abscissa. 



K = Kv, (57) 

so that for — >■ our model becomes equivalent to a simple 
sinusoid. For large v the function (|56|l represents a comb-like 
sequence of periodic narrow peaks, each having the width 
~ l/y/u. Note that a very similar function with a bit dif- 
ferent scaling, exp(i/ cos a;)/7o(:^), represents a probability 
density function of the so-called von Mises distribution. This 
distribution is a periodic analog of the Gaussian one, posess- 
ing a similar maximum-entropy property on a circle. For the 
sake of convinience we will also call H56p as the von Mises 
function, since these small differences in the centering and 
normalization are not very important for us. 

In Fig. [2] we plot the von Mises function for several val- 
ues of the localization parameter u. Looking at these plots, 
we might notice that such shape may provide a satisfac- 
tory generic approximation to many physical variabilities 
that emerge in the astronomical practice. In particular, it 
may appear good for the lightcurves of many variable stars 
and planetary transits (just turn these graphs upside-down 
to optain something similar to a transit). Since such model 
is functionally very simple (and thus quickly calculatable 
and easy in various analytic manipulations), it looks rather 
tempting to construct a periodogram that could utilise it as 
a model of the probe periodic signal. 

Assume that we scan this periodogram in a rectangle 
J'min ^ ^ J^max aud ^ / ^ ./max, aud dlsallowiug the 
signal amplitude K to become negative. Then we should 
first evaluate the function r using (|20p for the interior of 
this rectangle (implying n = 3). It will be approximately 
proportional to W . Also, we should evaluate the function 
A/boundary, which coutaius the term responsible for the four 
sides and four vertices of the mentioned rectangle. Among 
these, only the terms due to the sides ^ — i^min,max are im- 
portant. This is becuase these are the only boundary terms 
proportional to W . The boundary terms due to the two other 
sides and due to the vertices do not contain this multiplier, 
because the frequency / is held fixed there. Since in practice 
W is large or very large, we only need to take into account 
the boundary edges running along the frequency axis. 

The final approximation to the false alarm probability 
can be represented in the form 



where 



J max f ^TT 





/max 27r 



(59) 



In (|58p . the terms involving the factor X correspond 
to the local maxima of 77 in the interior of the rectangle. 
The difference X(i-'max) — (i^min) is because the factor A 
now contains an integral from i^min to I'max, while the func- 
tion X(u) is defined as an integral from to v. The terms 
with Y are for maxima on the two boundary lines v = fmin 
and V — fmax- The sum F(i^min) + ^(i^max) is because we 
need to sum the maxima at the both borders, and the ex- 
tra multiplier of 1/2 is because we should filter out half of 
these boundary maxima, due to the derivative drj/dv hav- 
ing at these maxima an inappropriate sign with probability 
1/2. The quantities WXiy) and WY^v) represent, in fact, 
the factor A for the rectangle [0, u] x [0, /max] and for the 
boundary segments {v = i^min, t'max} x [0, /max]- 

In (|59p. the 3x3 matrix Gf\v corresponds to the full 
gradient of rj over all three non- linear parameters /, A, 
the matrix G/a is a 2 x 2 submatrix of G/ai^ involving only 
the elements corresponding to only the parameters / and A. 
Both these matrices depend on all three parameters. Notice 
that the terms in (|58|l containing Y basically correspond to 
the signal having a fixed non-sinusoidal shape (i/ is fixed), 
and thus they can be also treated using the formalism of 
Sect. |6T] 

In the particular case Vmi-n ~ (with the sign of K 
still fixed) we should take into account the obvious equality 
X{0) = 0; 



FAP(2) < M{z) = We' 



zX(l/max) + 



-(n^max) + y(0))^+O(2°) 



(60) 



FAP(z) < M{z) = We' 



(X(l/max) - X{Umin))z + 



Notice that in practice Y{0) ~ 1 with good precision (see 
below) . 

When K is allowed to be positive as well as negative we 
should double the right-hand side of the expression in ((58}. 
This is because we should now honour the local minima of 
the random field 77 too, as well as its local maxima. We have 
in this case: 

FAP(z) < M{z) = We-' [2(X(!.max) - X{u^in))z + 

+ {Y{l^n.in)+Y{Un..^))^+0{z")] , (61) 

A special case occures when Vmin ~ and the sign of 
K is arbitrary. Then we have, basically, some degeneracy of 
the free variables at v — 0, making the cases K < and 
K > equivalent to each other (due to the symmetry of the 
sinusoid). This property can be used to refine the Rice bound 
a bit. Assume that we have some point at the boundary 
1^ = 0, such that 77 > (implying K > 0) and drj/dv < 0. It 
is easy to show that at a dual point with A A-f tt the value 
of ?7 changes the sign (hence K < 0), but the value of drj/dv 
remains exactly the same. This is because the derivative of g 
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in (|57p over z/ is a 7r-periodic function of x (at i/ = 0), while 
the model g itself is 27r-periodic. Therefore, the derivative 
d\ri\/dv has different sign in these dual points, while the 
value of \ri\ is identical. 

Therefore, although there are many "good" boundary 
maxima at J/ = 0, which satisfy the condition d\ri\/di/ < 
(meaning that |?7| necessarily decreases when we step from 
— inwards), each such a maximum has a dual "bad" 
maximum at A + vr, where d\ri\/dv > 0. Since there are no 
constraints on A, this means that for any boundary maxi- 
mum at i/ = 0, either "good" or "bad", we can find larger 
values of rj in the interior u > 0. This implies that the global 
maximum ot \rj\ cannot b e attained at the line u = 0. Fol- 
lowing the terminology bv lAzai's fc Delrnai l|2002h (see their 
Theorem 2), we basically proved that the field ^ (or I?;]) is 
a "field without boundary", concerning the boundary line 
u = 0. This allows us to just drop the relevant boundary 
term with F(i^min = 0): 

FAP(z) < M{z) = We-" [2zX(i.n,ax) + Y{Un..^)^+0{z°)] , (62) 

Notice that it is essential here that K has arbitrary sign, 
because otherwise we could not freely swap the values 77 > ■-. 
with rj < 0. > 

Let us first apply the approach of the Section 14.11 to 
evaluate X and Y. The derivatives of the function (|56p look 
like 



g'x = — i^exp(i'cosa;) sina:, 
gl — exp(!/ cos x) cos a; — /i (;/) 



(63) 



Substituting them to (|32}, and using the well-known inte- 
gral representations for the modified Bessel functions 1^, we 
obtain 



g^ = Io{2y) - 



= ^ [/o(2m) 



gi'' = -[Io{2u)+l2{2u)]-lt{,y), 



= 0, 



9x91 



(64) 



and then 



det R = V 



^'/o(2t 



2 h{2u)-mp) 
_ h[2u) + l2{2v) 



21? (u) 



2[Io{2t.) - mu)] 
h(2v) - h{v)h{v) 



Jo(2i.) -/,?(;.) 



(65) 



The behaviour of these quantities is not obvious from these 
formulae, so we need to understand their asymptotic be- 
haviour. For small v we can use the Tailor expansion of 
Ik{z) to find that 17(0) = 1 and V(0) = 1/1 6. This implies 
that for the factor X the integrand gV det R is equal to 1/4 
at = 0; for the factor Y we have det R = 1 by definition 
and det R a.t v = Q is unit. For large v we may use the 
following asymptotically converging expansion: 



4fc^ - 1 (4fc^ - l)(4fc -9) 



3.5 



2.5 



-> 2 
X 

1.5 



0.5 



5 10 15 20 25 30 
localization parameter v 



n-Kz \ 8z 2!(8z)2 

which can be found e.g. in (|Korn fc Kornll 19681 . §21.. 
calculations lead us to 



.(66) 



The 



V ■ 



1 




10 15 20 
localization parameter v 



30 



Figure 3. The coefficients X and Y for tlie von Mises peri- 
odogram. Dasiied curve is for the approximation of Section |4. II 
while the three solid curves correspond to the direct precise 
method of Section 14.21 These three curves (from down to up) 
correspond to three simulated time series, containing N = 30, 
100, and 1000 randomly distributed simulated observations. For 
all cases we have W ~ 100. 



gVdet R : 



(for X), 



Wdet R : 



(for Y). (67) 



AV2 " " " 2 
When calculating X, we should further integrate the rele- 
vant quantity over v, so we obtain X{i') ~ v for large v. The 
factor Y does not request this integration, but its asymp- 
totics appears eventually the same, Y{i') ~ 1/ for large u. 
Also, for large v we have Y{h')/X{i') ~ 2\/2tt, hence the 
"primary" X-term in (|58|l really exceeds the V-term only 
for z > 27r, and even beyond this level they remain mutu- 
ally comparable up to rather large z. This means that both 
these terms should be taken into accound in practice; none 
can be neglected. 

These also results infer that the integral in (|38|l is infi- 
nite if we do not limit u from the upper side. In this regard 
the parameter u is similar to the frequency /. When we scan 
the usual periodogram in a wider frequency range, the prob- 
ability to find a high noisy peak in this range inevitably in- 
creases. Similarly, an attempt to detect more quickly- varying 
signals with larger u will inevitably increase our chances to 
catch a noisy fluctuation instead of a true signal. 

We find that the approximate expressions for the factors 
X and Y obtained using the formalism of Section |4T] a very 
accurate for small but this accuracy decreases when v 
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Figure 4. The FAP approximation of the von Mises periodogram in comparison with Monte Carlo simulations. In each panel we show 
the simulated and analytic FAP curves for i^min = and six different values of I'max from to 315. In the graphs we mark, instead of 
the upper limit for u, a lower limit for a more intuitive FWHM (Full Width at Half Maximum) characteristic of the signal peaks. It can 
be easily mapped one-to-one with i^max and it varies here from 1/2 (corresponding to the sinusoidal variation) to approximately 1/50 
(typical for e.g. a planetary transit). The panels to the left correspond to the cases with the noise uncertainties cTi known a priori; the 
ones to the right are for the multiplicative noise model of Sect. [5] cr^ oc l/ui^. In the case of "clumped random timings" (right-bottom 
panel) the N = 100 points of the time series were equally split in 10 equidistant groups with 90% gaps between them. This implies a 
very strong aliasing, which makes our analytic approximation for FAP relatively inaccurate, though they still work as an upper limit. 



grows, and increases when A'^ grows (Fig. |3j. We assume 
that the error of this approximation emerges because our 
observations cannot sample well the narrow peaks of the 
signal having the width ~ l/v^- 

We have done some Monte Carlo simulations to test the 
quality of the approximation (|6Up . The results are shown in 
Fig. |4l We find that our analytic approximations behave 
exactly as we might expect. They have good accuracy for 
practically important small levels of the FAP and for not 
too large i/max and time-series leakage. For larger Vma.x or 
for strong leakage the accuracy somewhat degrades, but the 
formula ()60p still works as an upper bound on FAP. Our 
conclusion is that this formula would be certainly useful in 
practical applications. 

At last we would like to demonstrate the power of the 
von Mises periodogram itself. We generated a simulated 
time series with A'^ = 1000 randomly spaced observations. 
The values of the simulated measurements contained two 
periodic signals with comparable amplitudes: a sinusoidal 
variation and periodic flat drops (simulating planetary tran- 
sits). Both signals were below the noise level, so the noise 
should provide significant contamination. The von Mises pe- 
riodogram of these data is plotted in Fig. [5] We can see that 
it allows an easy detection of the both signals at once (/ « 8 
and / ~ 56), while the LS periodogram (wich represents, ba- 
sically the middle horizontal slice of the plot) would robustly 



reveal only the sinusoidal periodicity, allowing the planetary 
transit to slip away until the next step of the analysis. 



7 CONCLUSIONS 

In this pap er, we extended our previous results (|Baluevl 
l2008l . l2009bl ) to the case when the model of the signal to be 
detected in the noisy data depends on unknown parameters 
in a non-linear manner. The definition of the periodogram 
was extended to this non-linear (and non-sinusoidal) case 
in terms of the a-nd likelihood-ratio tests. We described 
a generic method of constructing an asymptotic approxi- 
mation to the false alarm probability. Based on these gen- 
eral results, we considered two specialized non-linear peri- 
odograms. The first one involves a fixed-shape periodic non- 
sinusoidal model of the signal. The second one models the 
signal with the so-called von Mises function exp(// cos a;). 
This function is very remarkable, because it allows fairly 
good approximation of very different periodic variations, 
from the plain sinusoid to a model with periodic narrow 
peaks or drops (typical for e.g. the exoplanetary transit 
lightcurve). For both these periodograms we provide a com- 
plete theoretical solution of the false alarm probability prob- 
lem. 

Moreover, for the von Mises periodogram we offer a sup- 
porting package of CH — h programs, that may dramatically 
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Figure 5. The von Mises periodogram plotted for a test time 
series containing two periodic signals contaminated by noise (see 
text). The lower half (FWHM < 0.5) of the graph correspond to 
a drop-type periodicity, while the upper part (for which we made 
a symbolic replacement FWHM ^ 1 — FWHM for convenience, 
so the labelled values FWHM exceed 0.5) is for a peak- type vari- 
ation. The slice at FWHM = 0.5 represents the LS periodogram 
assuming the sinusoidal model of the signal. 

faciliate the use of the relevant theory in practice. This pack- 
age is attached to the present paper as the online-only sup- 
porting material (as a compressed archive). 

We expect that the results of this work can be used in a 
wide variety of astronomical applications that deal with non- 
sinusoidal periodicities in observational data. These research 
fields are ranged from the studies of variable stars to the 
studies of extrasolar planetary systems. 

In the forthcoming and future work, we plan to ap- 
ply our approach to the so-called double-frequency peri- 
odogram, where the signal is modelled by a sum of two inde- 
pendent sinusoidal terms (to appear in Astron. Lett., under 
review), to the Schuster periodogra m and to the so- called 
Keplerian periodogram introduced by'Cummine ('2OO4I), also 
known as "2DKLS periodogram" (|0'Toolc ct al...2009.) . 
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